-
-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize dropping in bevy_ecs #2897
Conversation
// SAFETY: The pointer points to a valid array of type `[T; len]` and it is safe to drop this value. | ||
unsafe fn drop_multiple<T>(x: *mut u8, len: usize) { | ||
for i in 0..len as isize { | ||
x.cast::<T>().offset(i).drop_in_place() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not a free function in the blob_vec
module?
It's repeated here and as an associated function of ComponentDescriptor
// SAFETY: The pointer points to a valid array of type `[T; len]` and it is safe to drop this value. | ||
unsafe fn drop_ptr_multiple<T>(x: *mut u8, len: usize) { | ||
for i in 0..len as isize { | ||
x.cast::<T>().offset(i).drop_in_place() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use ptr::add
instead, to avoid the isize dance.
x.cast::<T>().drop_in_place() | ||
// SAFETY: The pointer points to a valid array of type `[T; len]` and it is safe to drop this value. | ||
unsafe fn drop_ptr_multiple<T>(x: *mut u8, len: usize) { | ||
for i in 0..len as isize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe write a manual while loop here instead? That produces less LLVM ir and as such is probably a bit faster in debug mode and takes a bit less time to compile. (all tiny bits help) If you think that is too ugly, it is fine to keep it as is though.
Are we certain that this is actually a problem that needs fixing? Has anyone benchmarked this or looked at the generated assembly? |
# Objective - We do a lot of function pointer calls in a hot loop (clearing entities in render). This is slow, since calling function pointers cannot be optimised out. We can avoid that in the cases where the function call is a no-op. - Alternative to #2897 - On my machine, in `many_cubes`, this reduces dropping time from ~150μs to ~80μs. ## Solution - Make `drop` in `BlobVec` an `Option`, recording whether the given drop impl is required or not. - Note that this does add branching in some cases - we could consider splitting this into two fields, i.e. unconditionally call the `drop` fn pointer. - My intuition of how often types stored in `World` should have non-trivial drops makes me think that would be slower, however. N.B. Even once this lands, we should still test having a 'drop_multiple' variant - for types with a real `Drop` impl, the current implementation is definitely optimal.
# Objective - We do a lot of function pointer calls in a hot loop (clearing entities in render). This is slow, since calling function pointers cannot be optimised out. We can avoid that in the cases where the function call is a no-op. - Alternative to bevyengine#2897 - On my machine, in `many_cubes`, this reduces dropping time from ~150μs to ~80μs. ## Solution - Make `drop` in `BlobVec` an `Option`, recording whether the given drop impl is required or not. - Note that this does add branching in some cases - we could consider splitting this into two fields, i.e. unconditionally call the `drop` fn pointer. - My intuition of how often types stored in `World` should have non-trivial drops makes me think that would be slower, however. N.B. Even once this lands, we should still test having a 'drop_multiple' variant - for types with a real `Drop` impl, the current implementation is definitely optimal.
# Objective - We do a lot of function pointer calls in a hot loop (clearing entities in render). This is slow, since calling function pointers cannot be optimised out. We can avoid that in the cases where the function call is a no-op. - Alternative to bevyengine#2897 - On my machine, in `many_cubes`, this reduces dropping time from ~150μs to ~80μs. ## Solution - Make `drop` in `BlobVec` an `Option`, recording whether the given drop impl is required or not. - Note that this does add branching in some cases - we could consider splitting this into two fields, i.e. unconditionally call the `drop` fn pointer. - My intuition of how often types stored in `World` should have non-trivial drops makes me think that would be slower, however. N.B. Even once this lands, we should still test having a 'drop_multiple' variant - for types with a real `Drop` impl, the current implementation is definitely optimal.
@TheRawMeatball if you're up for it, benchmarks on this would be great. If you're busy with other stuff just let me know and we can mark this as abandoned for someone else to experiment with. |
for i in 0..len as isize { | ||
x.cast::<T>().offset(i).drop_in_place() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this just be std::slice::from_raw_parts_mut(x.cast::<T>(), len).drop_in_place()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It did have to be the std::ptr::
counterpart, but i think that could work.
# Objective - We do a lot of function pointer calls in a hot loop (clearing entities in render). This is slow, since calling function pointers cannot be optimised out. We can avoid that in the cases where the function call is a no-op. - Alternative to bevyengine#2897 - On my machine, in `many_cubes`, this reduces dropping time from ~150μs to ~80μs. ## Solution - Make `drop` in `BlobVec` an `Option`, recording whether the given drop impl is required or not. - Note that this does add branching in some cases - we could consider splitting this into two fields, i.e. unconditionally call the `drop` fn pointer. - My intuition of how often types stored in `World` should have non-trivial drops makes me think that would be slower, however. N.B. Even once this lands, we should still test having a 'drop_multiple' variant - for types with a real `Drop` impl, the current implementation is definitely optimal.
Adopted! |
Previously, the type-erased drop impl was for dropping a single element, which meant a large number of fn pointer calls and having to iterate each component even for no-op drops. This PR moves the iteration to the generic part, which should enable noop drops to get optimized much further.